Repairing Bengali Verb Chunks for Improved Bengali to Hindi Machine Translation
Abstract
This paper identifies the mistakes made by a data-driven Bengali chunker. Analysis of the output of a chunk-based machine translation system shows that the major classes of errors stem from mistakes in verb chunk identification. Based on an analysis of the types of mistakes in Bengali verb chunk identification, we propose a set of repair modules. These modules use tables of manually created entries, which are validated using chunk-annotated and dependency-annotated corpus data. The modules repair the Bengali verb chunks and thereby improve the quality of a Bengali-to-Hindi transfer-based machine translation system.
Similar resources
Translations of Ambiguous Hindi Pronouns to Possible Bengali Pronouns
In a Hindi-to-Bengali transfer-based machine translation system, the baseline lexical transfer module replaces a Hindi word with its most frequent Bengali translation. Some pronouns in Hindi can have multiple translations in Bengali. The choice of the actual translation has a big impact on the accessibility of the translated sentence. The list of Hindi pronouns is small and their corresponding Bengal...
Word Sense Disambiguation in Bengali applied to Bengali-Hindi Machine Translation
We have developed a word sense disambiguation (WSD) system for the Bengali language and applied it to obtain the correct lexical choice in Bengali-Hindi machine translation. We are not aware of any existing system for Bengali WSD. Since there is no sense-annotated Bengali corpus and no sufficiently large parallel corpus for the Bengali-Hindi language pair, we had to use an unsupervised approach. We use a...
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks in the ad-hoc track at CLEF. The cross-language task involves retrieving English documents in response to queries in two of the most widely spoken Indian languages, Hindi and Bengali. For our experiments, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K H...
Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental to many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, and question answering. This paper reports on the development of a NER system for Bengali a...
Handling Plurality in Bengali Noun Phrases
Plurality of a Bengali Noun Phrase (NP) is not always determined by the plurality of its governing member (or the head). It is often seen that an NP is plural but the plurality is indicated through qualifiers or other means whereas the head noun has the singular form. In such scenarios, the plurality of the NP is determined by analyzing its non-head members or from other components (or context)...
Publication date: 2012